1.
Clin Epigenetics ; 14(1): 161, 2022 12 02.
Article in English | MEDLINE | ID: mdl-36461044

ABSTRACT

BACKGROUND: Parent of origin-specific allelic expression of imprinted genes is epigenetically controlled. In cancer, imprinted genes undergo both genomic and epigenomic alterations, including frequent copy number changes. We investigated whether copy number loss or gain of imprinted genes in cancer cell lines is associated with response to chemotherapy treatment. RESULTS: We analyzed 198 human imprinted genes including protein-coding genes and noncoding RNA genes using data from tumor cell lines from the Cancer Cell Line Encyclopedia and Genomics of Drug Sensitivity in Cancer datasets. We examined whether copy number of the imprinted genes in 35 different genome locations was associated with response to cancer drug treatment. We also analyzed associations of pretreatment expression and DNA methylation of imprinted genes with drug response. Higher copy number of BLCAP, GNAS, NNAT, GNAS-AS1, HM13, MIR296, MIR298, and PSIMCT-1 in the chromosomal region 20q11-q13.32 was associated with resistance to multiple antitumor agents. Increased expression of BLCAP and HM13 was also associated with drug resistance, whereas higher methylation of gene regions of BLCAP, NNAT, SGK2, and GNAS was associated with drug sensitivity. While expression and methylation of imprinted genes in several other chromosomal regions were also associated with drug response and many imprinted genes in different chromosomal locations showed considerable copy number variation, only imprinted genes at 20q11-q13.32 had a consistent association of their copy number with drug response. Copy number values among the imprinted genes in the 20q11-q13.32 region were strongly correlated. They were also correlated with the copy number of cancer-related non-imprinted genes MYBL2, AURKA, and ZNF217 in that chromosomal region. Expression of genes at 20q11-q13.32 was associated with ex vivo drug response in primary tumor samples from the Beat AML 1.0 acute myeloid leukemia patient cohort.
Association of the increased copy number of the 20q11-q13.32 region with drug resistance may be complex and could involve multiple genes. CONCLUSIONS: Copy number of imprinted and non-imprinted genes in the chromosomal region 20q11-q13.32 was associated with cancer drug resistance. The genes in this chromosomal region may have a modulating effect on tumor response to chemotherapy.


Subjects
Antineoplastic Agents; MicroRNAs; Neoplasms; Humans; DNA Copy Number Variations; DNA Methylation; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Drug Resistance, Neoplasm/genetics; Cell Line, Tumor; Neoplasms/drug therapy; Neoplasms/genetics
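The association between gene copy number and drug response reported in the abstract above can be illustrated with a rank correlation. The abstract does not state which statistic the authors used, so the Spearman coefficient here is an assumption, and the copy number and log IC50 values are invented for illustration only.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: one gene's copy number in five cell lines, and the
# log IC50 of one drug in the same lines (higher IC50 = more resistant).
copy_number = [1.0, 2.0, 2.5, 3.0, 4.0]
log_ic50 = [-1.2, -0.5, 0.1, 0.4, 1.3]
rho = spearman(copy_number, log_ic50)
```

A positive `rho` in this toy setup would correspond to higher copy number going with higher IC50, that is, with resistance. A real analysis would also need a significance test and correction for multiple comparisons across genes and drugs.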
2.
Sci Rep ; 11(1): 17275, 2021 08 26.
Article in English | MEDLINE | ID: mdl-34446762

ABSTRACT

TP53 is one of the most frequently altered genes in cancer; it can be inactivated by a number of different mechanisms. NM_000546.6 (ENST00000269305.9) is by far the predominant TP53 isoform; however, a few alternative isoforms have been described that are expressed at much lower levels. To better understand patterns of TP53 alternative isoform expression in cancer and normal samples, we performed an analysis of TP53 isoforms based on exon-exon junction reads, using RNA-seq data from The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE), and the Genotype-Tissue Expression (GTEx) project. TP53 C-terminal alternative isoforms have abolished or severely decreased tumor suppressor activity; therefore, an increase in the fraction of TP53 C-terminal alternative isoforms might be expected in tumors with wild type TP53. Contrary to this expectation, we observed no substantial increase in the fraction of TP53 C-terminal alternative isoforms in TCGA tumors and CCLE cancer cell lines with wild type TP53, likely indicating that expression of TP53 C-terminal alternative isoforms cannot be reliably selected for during tumor progression.


Subjects
Alternative Splicing; Exons/genetics; Gene Expression Regulation, Neoplastic; Mutation; Neoplasms/genetics; Tumor Suppressor Protein p53/genetics; Cell Line, Tumor; Disease Progression; Genes, Tumor Suppressor; Humans; Neoplasms/metabolism; Neoplasms/pathology; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA-Seq/methods; Tumor Suppressor Protein p53/metabolism
3.
Clin Epigenetics ; 13(1): 49, 2021 03 06.
Article in English | MEDLINE | ID: mdl-33676569

ABSTRACT

BACKGROUND: Altered DNA methylation patterns play important roles in cancer development and progression. We examined whether expression levels of genes directly or indirectly involved in DNA methylation and demethylation may be associated with response of cancer cell lines to chemotherapy treatment with a variety of antitumor agents. RESULTS: We analyzed 72 genes encoding epigenetic factors directly or indirectly involved in DNA methylation and demethylation processes. We examined association of their pretreatment expression levels with methylation beta-values of individual DNA methylation probes, DNA methylation averaged within gene regions, and average epigenome-wide methylation levels. We analyzed data from 645 cancer cell lines and 23 cancer types from the Cancer Cell Line Encyclopedia and Genomics of Drug Sensitivity in Cancer datasets. We observed numerous correlations between expression of genes encoding epigenetic factors and response to chemotherapeutic agents. Expression of genes encoding a variety of epigenetic factors, including KDM2B, DNMT1, EHMT2, SETDB1, EZH2, APOBEC3G, and other genes, was correlated with response to multiple agents. DNA methylation of numerous target probes and gene regions was associated with expression of multiple genes encoding epigenetic factors, underscoring complex regulation of epigenome methylation by multiple intersecting molecular pathways. The genes whose expression was associated with methylation of multiple epigenome targets encode DNA methyltransferases, TET DNA methylcytosine dioxygenases, the methylated DNA-binding protein ZBTB38, KDM2B, SETDB1, and other molecular factors which are involved in diverse epigenetic processes affecting DNA methylation. 
While baseline DNA methylation of numerous epigenome targets was correlated with cell line response to antitumor agents, the complex relationships between the overlapping effects of each epigenetic factor on methylation of specific targets and the importance of such influences in tumor response to individual agents require further investigation. CONCLUSIONS: Expression of multiple genes encoding epigenetic factors is associated with drug response and with DNA methylation of numerous epigenome targets that may affect response to therapeutic agents. Our findings suggest complex and interconnected pathways regulating DNA methylation in the epigenome, which may both directly and indirectly affect response to chemotherapy.


Subjects
Antineoplastic Agents/therapeutic use; Biomarkers, Pharmacological/metabolism; Cell Line/metabolism; Neoplasms/genetics; APOBEC-3G Deaminase; Cell Line/drug effects; DNA (Cytosine-5-)-Methyltransferase 1; DNA Methylation; DNA-Binding Proteins/genetics; Dioxygenases/genetics; Enhancer of Zeste Homolog 2 Protein; Epigenome; Epigenomics; F-Box Proteins; Gene Expression Regulation, Neoplastic/genetics; Histocompatibility Antigens; Histone-Lysine N-Methyltransferase; Humans; Jumonji Domain-Containing Histone Demethylases; Neoplasms/drug therapy; Promoter Regions, Genetic; Repressor Proteins
4.
Bioinformatics ; 37(18): 3026-3028, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33714997

ABSTRACT

SUMMARY: In this article, we introduce a hierarchical clustering and Gaussian mixture model with expectation-maximization (EM) algorithm for detecting copy number variants (CNVs) using whole exome sequencing (WES) data. The R shiny package 'HCMMCNVs' is also developed for processing user-provided bam files, running the CNV detection algorithm and conducting visualization. By applying our approach to 325 cancer cell lines in 22 tumor types from the Cancer Cell Line Encyclopedia (CCLE), we show that our algorithm is competitive with other existing methods and that using multiple cancer cell lines for CNV estimation is feasible. In addition, by applying our approach to WES data of 120 oral squamous cell carcinoma (OSCC) samples, we show that our algorithm, using the tumor sample only, exhibits more power in detecting CNVs than methods using both tumors and matched normal counterparts. AVAILABILITY AND IMPLEMENTATION: HCMMCNVs R shiny software is freely available at the GitHub repository https://github.com/lunching/HCMM_CNVs and at Zenodo https://doi.org/10.5281/zenodo.4593371. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Carcinoma, Squamous Cell; Mouth Neoplasms; Humans; Exome Sequencing; DNA Copy Number Variations; Mouth Neoplasms/genetics; Software; Algorithms; Cluster Analysis
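The Gaussian mixture model with EM named in the abstract above can be sketched in one dimension. This is a generic illustration, not the HCMMCNVs implementation: the input is assumed to be per-region log2 coverage ratios, the initial component means are assumed to come from a prior clustering step, and the function name, parameters, and data are all hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, initial_means, init_var=0.05, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
            weights[j] = nj / len(data)
    return weights, means, variances


# Hypothetical log2 coverage ratios: three regions near -1 (single-copy
# loss) and five regions near 0 (copy-neutral).
log_ratios = [-1.0, -0.9, -1.1, -0.1, 0.0, 0.05, 0.1, -0.05]
weights, means, variances = fit_gmm_1d(log_ratios, initial_means=[-1.0, 0.0])
```

Each region could then be assigned to the component with the highest posterior probability, with the component mean suggesting the copy number state. Taking starting means from a clustering step rather than at random is one common way to make EM less sensitive to initialization.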
5.
Hum Mutat ; 42(4): 342-345, 2021 04.
Article in English | MEDLINE | ID: mdl-33600011

ABSTRACT

Splice site variants may lead to transcript alterations, causing exon inclusion, exclusion, or truncation, or intron retention. Interpreting the consequences of a specific splice site variant is not straightforward, especially if the variant is located outside of the canonical splice sites. We developed MutSpliceDB (https://brb.nci.nih.gov/splicing), a public resource to facilitate the interpretation of the effects of splice site variants on splicing, based on manually reviewed RNA-seq BAM files from samples with splice site variants.


Subjects
RNA Splice Sites; RNA Splicing; Alternative Splicing; Exons/genetics; Humans; Introns/genetics; RNA Splice Sites/genetics; RNA Splicing/genetics; RNA-Seq
6.
Clin Epigenetics ; 12(1): 93, 2020 06 25.
Article in English | MEDLINE | ID: mdl-32586373

ABSTRACT

BACKGROUND: Small cell lung cancer (SCLC) is an aggressive neuroendocrine lung cancer. SCLC progression and treatment resistance involve epigenetic processes. However, links between SCLC DNA methylation and drug response remain unclear. We performed an epigenome-wide study of 66 human SCLC cell lines using the Illumina Infinium MethylationEPIC BeadChip array. Correlations of SCLC DNA methylation and gene expression with in vitro response to 526 antitumor agents were examined. RESULTS: We found multiple significant correlations between DNA methylation and chemosensitivity. A potentially important association was observed for TREX1, which encodes the 3' exonuclease I that serves as a STING antagonist in the regulation of a cytosolic DNA-sensing pathway. Increased methylation and low expression of TREX1 were associated with the sensitivity to Aurora kinase inhibitors AZD-1152, SCH-1473759, SNS-314, and TAK-901; the CDK inhibitor R-547; the Vertex ATR inhibitor Cpd 45; and the mitotic spindle disruptor vinorelbine. Compared with cell lines of other cancer types, TREX1 had low mRNA expression and increased upstream region methylation in SCLC, suggesting a possible relationship with SCLC sensitivity to Aurora kinase inhibitors. We also identified multiple additional correlations indicative of potential mechanisms of chemosensitivity. Methylation of the 3'UTR of CEP350 and MLPH, involved in centrosome machinery and microtubule tracking, respectively, was associated with response to Aurora kinase inhibitors and other agents. EPAS1 methylation was associated with response to Aurora kinase inhibitors, a PLK-1 inhibitor and a Bcl-2 inhibitor. KDM1A methylation was associated with PLK-1 inhibitors and a KSP inhibitor. Increased promoter methylation of SLFN11 was correlated with resistance to DNA damaging agents, as a result of low or no SLFN11 expression. The 5' UTR of the epigenetic modifier EZH2 was associated with response to Aurora kinase inhibitors and a FGFR inhibitor. 
Methylation and expression of YAP1 were correlated with response to an mTOR inhibitor. Among non-neuroendocrine markers, EPHA2 was associated with response to Aurora kinase inhibitors and a PLK-1 inhibitor and CD151 with Bcl-2 inhibitors. CONCLUSIONS: Multiple associations indicate potential epigenetic mechanisms affecting SCLC response to chemotherapy and suggest targets for combination therapies. While many correlations were not specific to SCLC lineages, several lineage markers were associated with specific agents.


Subjects
Cell Line, Tumor/drug effects; DNA Methylation/genetics; Epigenome/genetics; Small Cell Lung Carcinoma/genetics; Antineoplastic Agents/pharmacology; Antineoplastic Agents, Phytogenic/pharmacology; Aurora Kinases/antagonists & inhibitors; Cell Cycle Proteins/antagonists & inhibitors; Cyclin-Dependent Kinase Inhibitor Proteins/pharmacology; DNA Methylation/drug effects; Drug Therapy, Combination/statistics & numerical data; Exodeoxyribonucleases/genetics; Exodeoxyribonucleases/metabolism; Gene Expression/drug effects; Gene Expression/genetics; Gene Expression Regulation, Neoplastic/drug effects; High-Throughput Nucleotide Sequencing/methods; Histone Demethylases/drug effects; Histone Demethylases/genetics; Humans; Lung Neoplasms/pathology; Membrane Proteins/antagonists & inhibitors; Nuclear Proteins/drug effects; Nuclear Proteins/genetics; Phosphoproteins/genetics; Protein Serine-Threonine Kinases/antagonists & inhibitors; Proto-Oncogene Proteins/antagonists & inhibitors; Proto-Oncogene Proteins c-bcl-2/antagonists & inhibitors; Small Cell Lung Carcinoma/diagnosis; Polo-Like Kinase 1
7.
Cancer Genet ; 237: 19-38, 2019 09.
Article in English | MEDLINE | ID: mdl-31447063

ABSTRACT

Folate-mediated one-carbon metabolism is essential for growth and survival of cancer cells. We investigated whether the response of cancer cells to antitumor treatment may be partially influenced by variation in expression of one-carbon metabolism genes. We used cancer cell line information from the Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer resources to examine whether variation in pretreatment expression of one-carbon metabolism-related genes was associated with response to treatment. GART, TYMS, SHMT2, MTR, ALDH2, BHMT, MAT2B, MTHFD2, NNMT, and SLC46A1 showed modest statistically significant correlations with response to a variety of antitumor agents. Higher expression levels of SLC46A1 were associated with resistance to multiple agents, whereas elevated expression of GART, TYMS, SHMT2, MTR, BHMT, and MAT2B was associated with chemosensitivity to multiple drugs. NNMT expression was bimodally distributed and showed different directions of association with various agents. Correlation of increased NNMT expression with sensitivity to dasatinib was validated in the NCI-60 cancer cell line panel. Pretreatment expression levels were correlated among many one-carbon metabolism genes. Expression of several folate genes was strongly associated with expression of multiple components of drug target pathways. Molecular mechanisms underlying associations of one-carbon metabolism genes with drug response require further investigation.


Subjects
Antineoplastic Agents/therapeutic use; Carbon/metabolism; Folic Acid/metabolism; Neoplasms/drug therapy; Neoplasms/genetics; Transcription, Genetic; Cell Line, Tumor; Gene Expression Profiling; Humans
8.
Hum Genomics ; 12(1): 20, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29642934

ABSTRACT

BACKGROUND: The APOBEC gene family of cytidine deaminases plays important roles in DNA repair and mRNA editing. In many cancers, APOBEC3B increases the mutation load, generating clusters of closely spaced, single-strand-specific DNA substitutions with a characteristic hypermutation signature. Some studies also suggested a possible involvement of APOBEC3A, REV1, UNG, and FHIT in molecular processes affecting APOBEC mutagenesis. It is important to understand how mutagenic processes linked to the activity of these genes may affect sensitivity of cancer cells to treatment. RESULTS: We used information from the Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer resources to examine associations of the prevalence of APOBEC-like motifs and mutational loads with expression of APOBEC3A, APOBEC3B, REV1, UNG, and FHIT and with cell line chemosensitivity to 255 antitumor drugs. Among the five genes, APOBEC3B expression levels were bimodally distributed, whereas expression of APOBEC3A, REV1, UNG, and FHIT was unimodally distributed. The majority of the cell lines had low levels of APOBEC3A expression. The strongest correlations of gene expression levels with mutational loads or with measures of prevalence of APOBEC-like motif counts and kataegis clusters were observed for REV1, UNG, and APOBEC3A. Sensitivity or resistance of cell lines to JQ1, palbociclib, bicalutamide, 17-AAG, TAE684, MEK inhibitors refametinib, PD-0325901, and trametinib and a number of other agents was correlated with candidate gene expression levels or with abundance of APOBEC-like motif clusters in specific cancers or across cancer types. CONCLUSIONS: We observed correlations of expression levels of the five candidate genes in cell line models with sensitivity to cancer drug treatment. 
We also noted suggestive correlations between measures of abundance of APOBEC-like sequence motifs with drug sensitivity in small samples of cell lines from individual cancer categories, which require further validation in larger datasets. Molecular mechanisms underlying the links between the activities of the products of each of the five genes, the resulting mutagenic processes, and sensitivity to each category of antitumor agents require further investigation.


Subjects
Drug Resistance, Neoplasm/genetics; Neoplasms/drug therapy; Neoplasms/genetics; Acid Anhydride Hydrolases/genetics; Antineoplastic Agents/therapeutic use; Cell Line, Tumor; Cytidine Deaminase/genetics; Gene Expression Regulation, Neoplastic/genetics; Humans; Minor Histocompatibility Antigens/genetics; Neoplasm Proteins/genetics; Neoplasms/pathology; Nuclear Proteins/genetics; Nucleotidyltransferases/genetics; Proteins/genetics
9.
Hum Mutat ; 38(11): 1449-1453, 2017 11.
Article in English | MEDLINE | ID: mdl-28762582

ABSTRACT

Tumor-suppressor genes can be inactivated by several mechanisms and, in a majority of cases, both alleles need to be affected. One of the mechanisms of inactivation involves deletions ranging from dozens to hundreds of nucleotides; such deletions are often missed by variant callers. HomDelDetect is a method to detect such homozygous deletions in cancer models, such as cancer cell lines and potentially patient tumor-derived xenografts. This method can be applied to partial exome, whole-exome sequencing, whole-genome sequencing, and RNA-seq data. We applied our method across a panel of CCLE cancer cell lines and observed good concordance with SNP array-based analysis. We also detected deletions that had been missed by variant callers and by SNP arrays, demonstrating the ability of HomDelDetect to improve the annotations of tumor-suppressor genes in cancer models.


Subjects
Genes, Suppressor; Homozygote; Models, Biological; Neoplasms/genetics; Sequence Deletion; Cell Line, Tumor; Exome; Gene Expression; Gene Silencing; Genomics/methods; Genotype; Humans; Neoplasms/diagnosis; Oligonucleotide Array Sequence Analysis; Exome Sequencing
10.
BMC Syst Biol ; 10 Suppl 3: 62, 2016 08 26.
Article in English | MEDLINE | ID: mdl-27587275

ABSTRACT

BACKGROUND: The high degree of heterogeneity observed in breast cancers makes it very difficult to classify the cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expression or the expression profiles of a panel of genes have helped, but such methods often produce misleading results due to their dynamic nature. In contrast, somatic DNA mutations are relatively stable and lead to initiation and progression of many sporadic cancers. Hence in this study, we explore the use of gene mutation profiles to classify, characterize and predict the subgroups of breast cancers. RESULTS: We analyzed the whole exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Somatic and non-synonymous single nucleotide variants identified from each patient were assigned a quantitative score (C-score) that represents the extent of negative impact on the gene function. Using these scores with the non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the mutation scores of the early- and late-stage-enriched subgroups identified 358 genes that carry significantly higher mutation rates in the late-stage-enriched subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late-stage-enriched subgroup of patients. Finally, using the identified subgroups, we also developed a supervised classification model to predict the stage of the patients. CONCLUSIONS: This study demonstrates that gene mutation profiles can be effectively used with unsupervised machine-learning methods to identify clinically distinguishable breast cancer subgroups.
The classification model developed in this method could provide a reasonable prediction of the cancer patients' stage solely based on their mutation profiles. This study represents the first use of only somatic mutation profile data to identify and predict breast cancer subgroups and this generic methodology can also be applied to other cancer datasets.


Subjects
Breast Neoplasms/genetics; Genomics/methods; Machine Learning; Mutation; Cluster Analysis; Humans
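The clustering step in the abstract above, non-negative matrix factorization (NMF) of a patient-by-gene score matrix, can be sketched as follows. The abstract does not say which NMF algorithm or settings were used, so the Lee-Seung multiplicative updates, the function names, and the toy matrix below are assumptions chosen for illustration.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative v (patients x genes) as w @ h with w, h >= 0."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # Multiplicative update for h: h *= (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # Multiplicative update for w: w *= (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


def frobenius_error(v, w, h):
    """Squared Frobenius norm of the residual v - w @ h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))


def assign_subgroups(w):
    """Assign each patient (row of w) to its largest-loading component."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical per-gene mutation scores: rows are patients, columns are genes.
scores = [
    [4.0, 4.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
]
w, h = nmf(scores, rank=2)
labels = assign_subgroups(w)
```

On a block-structured matrix like this one, patients who share the same mutated genes would be expected to load on the same component. The study reports three subgroups, so a comparable call would use `rank=3` on the real score matrix; choosing the rank is itself a modeling decision that this sketch does not address.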
11.
PLoS One ; 10(3): e0119383, 2015.
Article in English | MEDLINE | ID: mdl-25803781

ABSTRACT

Breast cancers exhibit highly heterogeneous molecular profiles. Although gene expression profiles have been used to predict the risks and prognostic outcomes of breast cancers, the high variability of gene expression limits its clinical application. In contrast, genetic mutation profiles may be more advantageous than gene expression profiles because genetic mutations can be stably detected and mutational heterogeneity is widespread in breast cancer genomes. We analyzed 98 breast cancer whole exome samples that were sorted into three subtypes, two grades and two stages. The sum deleterious effect of all mutations in each gene was scored to identify differentially mutated genes (DMGs) for this case-control study. DMGs were corroborated using extensive published knowledge. Functional consequences of deleterious SNVs on protein structure and function were also investigated. Genes such as ERBB2, ESP8, PPP2R4, KIAA0922, SP4, CENPJ, PRCP and SELP that have been experimentally or clinically verified to be tightly associated with breast cancer prognosis are among the DMGs identified in this study. We also identified some genes such as ARL6IP5, RAET1E, and ANO7 that could be crucial for breast cancer development and prognosis. Further, SNVs such as rs1058808, rs2480452, rs61751507, rs79167802, rs11540666, and rs2229437 that potentially influence protein functions are observed at significantly different frequencies in different comparison groups. Protein structure modeling revealed that many non-synonymous SNVs have a deleterious effect on protein stability, structure and function. Mutational profiling at the gene and SNV levels revealed differential patterns within each breast cancer comparison group, and the gene signatures correlate with expected prognostic characteristics of breast cancer classes. Some of the genes and SNVs identified in this study show high promise and are worthy of further investigation by experimental studies.


Subjects
Breast Neoplasms/genetics; Exome; Mutation; Adult; Breast Neoplasms/pathology; Case-Control Studies; Female; Genetic Association Studies; Humans; Neoplasm Grading; Neoplasm Staging; Prognosis; Sequence Analysis, DNA; Transcriptome
12.
BMC Bioinformatics ; 14: 96, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23496846

ABSTRACT

BACKGROUND: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. RESULTS: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function initially harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. 
Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks. CONCLUSION: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets have a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes with a potential application to detecting proteome-specific motifs of different organisms.


Subjects
Amino Acid Motifs; Proteins/classification; Sequence Analysis, Protein/methods; Amino Acid Substitution; Data Mining; Enzymes/chemistry; Enzymes/classification
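The harvesting, dampening, and thresholding steps described in the abstract above can be sketched as follows. The abstract does not give the dampening formula, so the inverse-class-frequency weight used here is an assumption, as are the function names, the threshold, and the toy sequences. The BLOSUM62-based merging of similar n-grams is omitted.

```python
import math
from collections import Counter


def harvest_ngrams(seq, sizes):
    """Return every contiguous n-gram of the given sizes in seq."""
    return [seq[i:i + n] for n in sizes for i in range(len(seq) - n + 1)]


def discriminative_ngrams(sequences_by_class, sizes=(4,), threshold=1.0):
    """Score n-grams per class and keep those scoring above the threshold."""
    # Count, per class, how many sequences contain each n-gram.
    class_counts = {}
    for label, seqs in sequences_by_class.items():
        counts = Counter()
        for seq in seqs:
            counts.update(set(harvest_ngrams(seq, sizes)))
        class_counts[label] = counts

    # Count how many classes each n-gram appears in.
    classes_containing = Counter()
    for counts in class_counts.values():
        classes_containing.update(counts.keys())

    n_classes = len(sequences_by_class)
    selected = {}
    for label, counts in class_counts.items():
        n_seqs = len(sequences_by_class[label])
        kept = {}
        for gram, count in counts.items():
            frequency = count / n_seqs  # normalizes for unequal class sizes
            # Assumed dampening factor: larger when fewer classes share the n-gram.
            dampening = math.log(n_classes / classes_containing[gram]) + 1.0
            score = frequency * dampening
            if score > threshold:
                kept[gram] = score
        selected[label] = kept
    return selected


# Hypothetical toy data: two classes with two short sequences each.
toy = {
    "nuclear": ["MKRKRPAA", "GGKRKRTT"],
    "cytoplasm": ["MAAAGGLL", "TTAAAGSS"],
}
motifs = discriminative_ngrams(toy, sizes=(4,), threshold=1.0)
```

In this toy input, `KRKR` occurs in both nuclear sequences and in no cytoplasmic one, so it scores above the threshold, while 4-grams seen in only one sequence do not. Passing `sizes=(4, 5, 6, 7, 8)` would match the 4- to 8-gram range mentioned in the abstract.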
13.
BMC Res Notes ; 5: 351, 2012 Jul 10.
Article in English | MEDLINE | ID: mdl-22780965

ABSTRACT

BACKGROUND: Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of organelles in the cell. Additionally, the majority of methods predict only a single location for a sequence, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function. FINDINGS: We present a software package and a web server for predicting the subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins both in prokaryotes and eukaryotes. The overall prediction accuracy varies from 89.8% to 91.4% across species. This program can predict 11 distinct locations each in plant and animal species. ngLOC also predicts 4 and 5 distinct locations on gram-positive and gram-negative bacterial datasets, respectively. CONCLUSIONS: ngLOC is a generic method that can be trained by data from a variety of species or classes for predicting protein subcellular localization. The standalone software is freely available for academic use under GNU GPL, and the ngLOC web server is also accessible at http://ngloc.unmc.edu.


Subjects
Internet; Proteins/metabolism; Software; Subcellular Fractions/metabolism; Bayes Theorem; Eukaryotic Cells/metabolism; Prokaryotic Cells/metabolism
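The idea of an n-gram-based Bayesian classifier mentioned in the abstract above can be illustrated with a small multinomial naive Bayes model. This is a generic sketch, not the ngLOC implementation: the class name, the n-gram length, the Laplace smoothing, and the toy sequences are all assumptions, and it predicts a single location only.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Return every contiguous n-gram in seq."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over sequence n-grams with Laplace smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, sequences, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = {label: Counter() for label in self.class_counts}
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_posteriors(self, seq):
        """Unnormalized log posterior of each class for the sequence."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_seqs in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_seqs / total)  # class prior
            for gram in ngrams(seq, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_posteriors(seq)
        return max(scores, key=scores.get)


# Hypothetical training data; real training would use curated proteins
# with experimentally annotated locations.
train_seqs = ["MKRKRKRPAA", "PKKKRKVGGS", "MLSLRQSIRF", "MLRTSSLFTR"]
train_labels = ["nuclear", "nuclear", "mitochondrion", "mitochondrion"]
model = NgramNaiveBayes(n=3).fit(train_seqs, train_labels)
prediction = model.predict("AAKRKRKV")
```

Because `log_posteriors` returns a score for every class, a multi-location prediction of the kind the abstract discusses could in principle be built by reporting all classes whose scores fall close to the best one; how ngLOC does this is not described in the abstract.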